In this exercise, we will be using functions from the tidyverse package. You can see we’ve added the chunk option message = FALSE to hide the version information that tidyverse normally displays.

library(tidyverse)

(a) Make a scatterplot with a line of best fit

Load the afl_grand_finals.csv dataset we looked at previously.

Make a scatterplot of winner_score (on the y axis) against year (on the x axis).

Add a LOESS smoother or line of best fit.

afl_grand_finals <- read_csv("afl_grand_finals.csv")
ggplot(afl_grand_finals, aes(x = year, y = winner_score)) +
  geom_point() +
  geom_smooth(method = "loess")
Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 3 rows containing missing values (`geom_point()`).
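
The warnings tell us that three rows could not be plotted, most likely because winner_score (or year) is missing in those rows. Here is a minimal sketch of how you might check this and drop such rows before plotting. The real data is not reproduced here, so toy_finals and its values are made up purely for illustration.

```r
library(tidyverse)

# Hypothetical stand-in for afl_grand_finals; these values are invented
toy_finals <- tibble(
  year         = c(2015, 2016, 2017, 2018),
  winner_score = c(107, NA, 108, NA)
)

# Count the rows that ggplot2 would drop because winner_score is missing
n_missing <- sum(is.na(toy_finals$winner_score))
n_missing

# Keep only the rows with a recorded score, so nothing is silently removed
toy_complete <- toy_finals %>%
  filter(!is.na(winner_score))
toy_complete
```

Whether you filter the rows yourself or let ggplot2 remove them, the plot looks the same. Filtering first just makes the decision explicit.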

(b) Make a line-and-dot chart

Here is a small dataset from the Victorian Electoral Commission, showing the number of lower house seats won by major parties in the 2018 state election:

election_data <- tribble(
                    ~party, ~seats_won,
       "Australian Greens",          3,
  "Australian Labor Party",         55,
                 "Liberal",         21,
           "The Nationals",          6,
        "Other Candidates",          3
)

First make a bar chart of it. You will need to specify which column to place on which axis, and use geom_col() instead of geom_bar(), since the data is already in summarised form (one observation per bar).

ggplot(election_data,
       aes(x = seats_won, y = party)) +
  geom_col()
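
To see why geom_col() is the right choice here, it can help to look at what geom_bar() would do with the same data. The sketch below is only an illustration; the names p_bar and bar_counts are our own, and layer_data() is a ggplot2 helper for inspecting the values a layer computes.

```r
library(tidyverse)

election_data <- tribble(
                    ~party, ~seats_won,
       "Australian Greens",          3,
  "Australian Labor Party",         55,
                 "Liberal",         21,
           "The Nationals",          6,
        "Other Candidates",          3
)

# geom_bar() counts the rows for each party and ignores seats_won
p_bar <- ggplot(election_data, aes(y = party)) +
  geom_bar()

# Each party appears in exactly one row, so every bar would have length 1
bar_counts <- layer_data(p_bar)$count
bar_counts
```

Because each party has a single row, geom_bar() draws five bars of equal length, which is not what we want. geom_col() uses the seats_won values directly.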

Next make a line-and-dot chart, similar to the one demonstrated in the slides for Five principles of good graphics. (What do you think geom_segment() does? Look it up in R’s help system to confirm your guess.)

ggplot(election_data,
       aes(x = seats_won, y = party)) +
  geom_point() +
  geom_segment(aes(xend = 0, yend = party))

(c) Extension: Improve your scatterplot

Take the scatterplot you made earlier and improve it to a standard you would be comfortable sharing with others.

  • Add appropriate axis labels, a title, and a caption indicating what your line displays.
  • Improve the x axis scale. ggplot usually provides sensible axis labels, but sometimes fails. In our case, the number “2020” is cut off slightly. You can control the x axis using scale_x_continuous(). Some parameters you can experiment with include limits = c(LOWNUMBER, HIGHNUMBER) (where LOWNUMBER and HIGHNUMBER are the lower and upper ends of the scale) and breaks = seq(LOWNUMBER, HIGHNUMBER, by = INCREMENT), which sets where the tick marks go on the axis. (breaks = is new, but we saw seq() briefly back in section 1.2!)
  • Extend the y axis to include zero. (Is there any reason to do this here? Some people advocate all axes going to zero, but this is a complex issue that deserves consideration for each plot you make.)
  • Apply your favourite theme.
  • Change the colour of the points, smoother line, and 95% confidence interval for the smoother line by giving colour = and fill = options to geom_point() and geom_smooth(). You can use hexadecimal colour codes (like HTML), or one of R’s named colours, which need to go inside quotes. A chart of the named colours is available here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
  • Extension: what happens if you swap the order of geom_point() and geom_smooth(), if the points are not black? Look carefully at the shaded confidence region.

ggplot(afl_grand_finals, aes(x = year, y = winner_score)) +
  geom_smooth(method = "loess",
              colour = "deepskyblue4", fill = "deepskyblue1") +
  geom_point(colour = "firebrick4") +
  scale_x_continuous(limits = c(1898, 2020),
                     breaks = seq(1900, 2020, by = 20)) +
  scale_y_continuous(limits = c(0, 200)) +
  labs(x = "Year", y = "Score of winning team",
       title = "AFL Grand Final scores over time",
       caption = "Blue trend line shows LOESS smoother.") +
  theme_bw() +
  theme(plot.title.position = "plot",
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank())
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 4 rows containing missing values (`geom_point()`).
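
If you want to check what the breaks = argument produces, or extend the y axis to zero without setting hard limits, here is a minimal sketch. It uses a small made-up tibble (toy_finals and its values are hypothetical, not the real AFL data). Keep in mind that limits = on a scale treats observations outside the range as missing, which may be why the warnings above count four rows rather than three. expand_limits() only widens the axis and never discards data.

```r
library(tidyverse)

# Tick positions generated by the breaks = argument used in the solution
year_breaks <- seq(1900, 2020, by = 20)
year_breaks

# Hypothetical data; the 2021 row lies outside limits = c(1898, 2020)
toy_finals <- tibble(
  year         = c(1990, 2000, 2010, 2021),
  winner_score = c(100, 120, 90, 140)
)

# How many rows would a scale with limits = c(1898, 2020) treat as missing?
n_outside <- sum(toy_finals$year < 1898 | toy_finals$year > 2020)
n_outside

# Alternative: keep every row and simply make sure zero is on the y axis
p_zero <- ggplot(toy_finals, aes(x = year, y = winner_score)) +
  geom_point() +
  scale_x_continuous(breaks = year_breaks) +
  expand_limits(y = 0)
```

Which approach is better depends on the plot: hard limits give you full control over the axis range, while expand_limits() is safer when you do not want to lose any observations.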

(d) Extension: Improve your bar or line-and-dot plot

Pick your favourite of the plots of the Victorian election data, and get it to a standard you would be happy sharing with others.

Some suggestions:

  • Sort the parties in order of number of seats won. Hint: fct_reorder(party, seats_won, .desc = TRUE) should feature in your solution.
  • Add appropriate axis labels.
  • Add party-appropriate colours. Hint: these are hexadecimal codes for the colours used by Australia’s major political parties: c("#DE3533", "#0047AB", "#006644", "#10C25B", "#808080") in the order: ALP, Liberal, Nationals, Greens, Other. (sourced from Wikipedia!)
  • If you add colours, where should the legend go? Sometimes removing it altogether with theme(legend.position = "none") is the best option.
  • Add a title and a caption indicating the source of the data.
  • Pick your favourite theme. Think about what gridlines are needed.

election_data_sorted <- election_data %>%
  mutate(party = fct_reorder(party, seats_won, .desc = TRUE))
ggplot(election_data_sorted,
       aes(x = seats_won,
           y = party,
           colour = party)) +
  geom_segment(aes(xend = 0, yend = party)) +
  geom_point() +
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +
  scale_y_discrete(limits = rev) +
  scale_colour_manual(values = c("#DE3533", "#0047AB",
                                 "#006644", "#10C25B",
                                 "#808080")) +
  labs(x = "Number of seats won",
       y = "Party",
       title = "Victorian state election 2018 lower house results",
       caption = "Data source: Victorian Electoral Commission") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title.position = "plot",
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank())
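
The solution above relies on the colours being listed in the same order as the sorted factor levels. One possible way to make this less fragile is to give scale_colour_manual() a named vector, so that each colour is matched to its party by name rather than by position. This is a sketch of that alternative, not the only way to do it; the name party_colours is our own.

```r
library(tidyverse)

election_data <- tribble(
                    ~party, ~seats_won,
       "Australian Greens",          3,
  "Australian Labor Party",         55,
                 "Liberal",         21,
           "The Nationals",          6,
        "Other Candidates",          3
)

# Named vector: each colour is tied to a party name, not to a position
party_colours <- c(
  "Australian Labor Party" = "#DE3533",
  "Liberal"                = "#0047AB",
  "The Nationals"          = "#006644",
  "Australian Greens"      = "#10C25B",
  "Other Candidates"       = "#808080"
)

election_data_sorted <- election_data %>%
  mutate(party = fct_reorder(party, seats_won, .desc = TRUE))

# Colours are looked up by name, so re-sorting the parties cannot mix them up
p_named <- ggplot(election_data_sorted,
                  aes(x = seats_won, y = party, colour = party)) +
  geom_segment(aes(xend = 0, yend = party)) +
  geom_point() +
  scale_y_discrete(limits = rev) +
  scale_colour_manual(values = party_colours) +
  theme(legend.position = "none")
p_named
```

With a named vector, the colours stay correct even if you later change how the parties are sorted.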


© 2021 Statistical Consulting Centre, The University of Melbourne.